 cyberpunk 2077


Evaluating Long-Context Reasoning in LLM-Based WebAgents

Chung, Andy, Zhang, Yichi, Lin, Kaixiang, Rawal, Aditya, Gao, Qiaozi, Chai, Joyce

arXiv.org Artificial Intelligence

As large language model (LLM)-based agents become increasingly integrated into daily digital interactions, their ability to reason across long interaction histories becomes crucial for providing personalized and contextually aware assistance. However, the performance of these agents in long context scenarios, particularly for action-taking WebAgents operating in realistic web environments, remains largely unexplored. This paper introduces a benchmark for evaluating long context reasoning capabilities of WebAgents through sequentially dependent subtasks that require retrieval and application of information from extended interaction histories. We develop a novel evaluation framework that simulates multi-session user interactions by injecting irrelevant task trajectories between dependent subtasks, creating contexts ranging from 25,000 to 150,000 tokens. Through extensive evaluation of four popular models, Claude-3.7, GPT-4.1, Llama 4, and o4-mini, we observe a dramatic performance degradation as context length increases, with success rates dropping from 40-50% in baseline conditions to less than 10% in long context scenarios. Our detailed error analysis reveals that agents primarily fail due to getting stuck in loops and losing track of original task objectives. We further propose an implicit RAG approach that provides modest improvements by generating task-relevant summaries, though fundamental limitations in long context reasoning persist. These findings highlight critical challenges for deploying WebAgents in realistic, long-term user interaction scenarios and provide insights for developing more robust agent architectures capable of maintaining coherent task execution across extended contexts.


MSI Raider 18 HX AI review: A benchmark-breaking beast in laptop form

PCWorld

The MSI Raider 18 HX AI isn't a looker, but it packs incredible CPU and GPU performance. The MSI Raider 18 HX AI is the very model of a "desktop replacement" laptop. It's big, it's not much to look at, and it has a mediocre touchpad that implies users are really expected to connect a mouse. That might leave some shoppers asking, "What's the point?" That question is answered once the laptop is tossed into a demanding game or application. It might be thick, but the MSI Raider 18 HX delivers top-tier CPU and GPU performance. It even has gobs of RAM and a PCIe 5.0 solid state drive.


Nvidia GeForce RTX 5090 review: Brutally fast, but DLSS 4 is the game changer

PCWorld

Nvidia's GeForce RTX 5090 is the most brutally fast graphics card ever introduced, augmented by new DLSS 4 technology that feels like magic. But you pay dearly for it, and it feels like this GPU was designed more for AI researchers than PC gamers. The wait is finally over. The long-awaited GeForce RTX 5090 lands on store shelves in January -- and friends, the flagship graphics card for Nvidia's new "Blackwell" architecture is an absolute monster. It should be for $2,000, of course.


Nvidia's DLSS 4 is so much more than just 'fake frames'

PCWorld

This year at CES, Nvidia presented the next generation of its DLSS upscaling technology, which is trained with the help of artificial intelligence, alongside the new GeForce RTX 5090, 5080, and 5070 (Ti) graphics cards. The company touted its major advantages -- and now that RTX 5090 reviews are live, we can confirm that DLSS 4 indeed feels like black magic, supercharging frame rates and making games feel just as snappy as the beloved Doom 2016. That's because DLSS 4 now supports Multi Frame Generation (MFG), an AI-based multiple intermediate frame calculation that can artificially generate up to three images and insert them between two "real" frames, thus quadrupling the frame rate. Of course, this feature only works on new Blackwell-based RTX 50-series GPUs. But are the AI frames generated in this way a step forward or is it all hogwash?


The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control

Feng, Ruili, Zhang, Han, Yang, Zhantao, Xiao, Jie, Shu, Zhilei, Liu, Zhiheng, Zheng, Andy, Huang, Yukun, Liu, Yu, Zhang, Hongyang

arXiv.org Artificial Intelligence

We present The Matrix, the first foundational realistic world simulator capable of generating continuous 720p high-fidelity real-scene video streams with real-time, responsive control in both first- and third-person perspectives, enabling immersive exploration of richly dynamic environments. Trained on limited supervised data from AAA games like Forza Horizon 5 and Cyberpunk 2077, complemented by large-scale unsupervised footage from real-world settings like Tokyo streets, The Matrix allows users to traverse diverse terrains -- deserts, grasslands, water bodies, and urban landscapes -- in continuous, uncut hour-long sequences. Operating at 16 FPS, the system supports real-time interactivity and demonstrates zero-shot generalization, translating virtual game environments to real-world contexts where collecting continuous movement data is often infeasible. For example, The Matrix can simulate a BMW X3 driving through an office setting--an environment present in neither gaming data nor real-world sources. This approach showcases the potential of AAA game data to advance robust world models, bridging the gap between simulations and real-world applications in scenarios with limited data.


Benchmarked: AMD's Ryzen AI 300 brings serious performance to Copilot+ laptops

PCWorld

What makes the PC the superior platform for personal computing? Sometimes that choice can be daunting though. Qualcomm's new Snapdragon X Elite just launched in Microsoft's debut Copilot+ laptops, coming out swinging against Intel's Core Ultra chip. With the launch of Ryzen AI 300-series today, you're now faced with a third choice. Fear not, as I've just finished performance testing AMD's Ryzen AI 9 HX 370, its new flagship processor for Copilot+ laptops.


Bafta games awards hail one of gaming's best ever years

The Guardian

In London last night, the 20th Bafta games awards celebrated a year that was stacked with critically acclaimed games. Taking place against the backdrop of an unprecedented year of layoffs and studio closures in the gaming industry, acknowledged by Bafta chair Sara Putt in her speech at the beginning of the evening, it was a much-needed night of recognition of the creative efforts of the video game development community. The sprawling Dungeons & Dragons-inspired role-playing game Baldur's Gate 3 won five awards, including the public-voted EE players' choice award and best game, alongside music, narrative and best performer in a supporting role (won by Andrew Wincott for his role as the devilish Raphael). Nintendo picked up the family and multiplayer awards for the exuberant Super Mario Bros Wonder, and technical achievement for The Legend of Zelda: Tears of the Kingdom. Alan Wake 2, the arresting, idiosyncratic horror game from Finnish studio Remedy, won artistic achievement and audio achievement.


CD Projekt Red used AI to include a deceased actor's voice in Cyberpunk 2077 DLC

Engadget

Cyberpunk 2077 developer CD Projekt Red has confirmed it used AI voice cloning software to reconstruct the voice of a deceased actor for its Phantom Liberty DLC. Actor Miłogost Reczek voiced the character Viktor Vektor in the Polish version of the game and would have been tapped to reprise the role for the DLC, which came out last month, but he died in 2021 before its production. The developer told Bloomberg it decided to go this route as a way to "pay tribute to his wonderful performance," and was given permission to do so by his family. Instead of replacing Reczek outright, CD Projekt Red worked with Respeecher, the Ukraine-based voice tech company known for de-aging Mark Hamill's voice in The Mandalorian and The Book of Boba Fett to create a young Luke Skywalker. Another actor was hired to speak the new lines, and Respeecher's software reworked them into Reczek's voice, CD Projekt localization director Mikołaj Szwed told Bloomberg.


Cyberpunk 2077: Phantom Liberty review: The city you've been waiting to burn

PCWorld

Phantom Liberty is CD Projekt RED's masterpiece. Not only is Cyberpunk 2077 Phantom Liberty graphically easily three generations ahead of the entire industry and redefines how we experience video games with pathtracing, it's also written even more thrillingly and staged even more explosively. Anyone who doesn't enjoy this several times in different play styles has never loved video games. Cyberpunk 2077's Phantom Liberty expansion is a reminder of how incredibly explosive gaming has become – and the perfection with which CD Projekt RED manages to involve its actors. When Idris Elba is on a train out of Dogtown, joking with Songbird about how they really need to eat that one famous burrito of his together sometime, and there's such an eerie silence to the flirtation – the nervous looks of the head hacker because she's about to betray him – these are moments that feel like they'd belong in House of Cards or 24.


There's a live-action Cyberpunk 2077 show or movie on the way

Engadget

Developer CD Projekt Red just announced it is in the early stages of developing a live-action TV show or movie based on the once-hated and now-beloved Cyberpunk 2077 game. Details are scant, as we don't even know if it'll be a film or ongoing series, but the game developer has teamed up with production company Anonymous Content to bring Night City to glorious live-action life. You probably don't know Anonymous Content by name, but the company's behind a slew of high-profile and critically acclaimed TV shows, like True Detective and Mr. Robot. It has also helped produce recent films like The Revenant and Spotlight, as well as classics like Eternal Sunshine of the Spotless Mind and Being John Malkovich. This is a serious production company, so we could be in for something special.